E.FARHI neural networks THEORY.DOC 06/1992.
****************************************************************************
LET'S NOW SEE HOW IT WORKS
****************************************************************************
I don't intend to write a thesis about it, but here are some important points
in order to understand network theory.
II) AN INTRODUCTION TO NEURAL NETWORKS
**************************************
II.1) Where does it come from?
-------------------------------
Nature is certainly what's best on earth, so man tries to imitate
it. We thought about copying neural systems for computing and electronic
applications because we were not very satisfied with what a usual computer can
do (just think about handwriting recognition and space optimisation problems).
In the human brain, which is composed of billions of neurons, each neuron
receives nerve impulses through its dendrites and emits a pulse through its
axon. Some neurons are more likely to receive signals (vision, hearing,...)
and others lead to an action (movement, idea,...).
A formal neural network is just a mathematical model of this.
II.2) And now, let me introduce you to the neuron.
--------------------------------------------------
The formal neuron is a tiny little thing that receives some inputs,
computes its synaptic potential, and then emits its state to its friends.
I1 --(W1)--\                'Wi' is the synaptic link (weight).
 .   .      \               'Ii' is the neuron input.
Ii --(Wi)-->(S)--> A
 .   .      /
In --(Wn)--/
Usually we use a linear potential:
S=Sum(Ii*Wi)+B
where 'B' is a constant for each neuron ('bias').
The output ('activity') is:
A=f(S)
where f is either a Heaviside type function or a sigmoid
type function.
       ^(A)                            ^(A)
       |                          (+1) |        _____
   (+1)|--------                       |    .--
       |                               |/
-------+---------->(S)     -----------+--------------->(S)
       |                              /|
-------|(-1)                  ____--'  |
       |                               | (-1)

    Heaviside                       Sigmoid
You see that there is a high (often +1) and a low (why not -1 ?) state
for the neuron. In fact the sigmoid curve is linked to a constant called
'temperature' by analogy to thermodynamic systems, and the Heaviside case
corresponds to T=0.
It is also possible to associate a probability with those curves. In that
case, the only possible states are high and low (without any middle state such
as for the sigmoid). You may have guessed (who knows ?) that it is called the
'stochastic' model.
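To make this concrete, here is a minimal sketch in Python (not part of
NEURON.ZIP; the function names and the tanh form of the sigmoid are just one
possible choice):

    import math

    def heaviside(s):
        # Heaviside case (T=0): only a high (+1) and a low (-1) state.
        return 1.0 if s >= 0.0 else -1.0

    def sigmoid(s, temperature=1.0):
        # Sigmoid case: a smooth curve between -1 and +1; the smaller the
        # 'temperature', the closer it gets to the Heaviside step.
        return math.tanh(s / temperature)

    def neuron_output(inputs, weights, bias, f=heaviside):
        # Synaptic potential S = Sum(Ii*Wi) + B, then activity A = f(S).
        s = sum(i * w for i, w in zip(inputs, weights)) + bias
        return f(s)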
II.3) How the net works...
--------------------------
It would not have been very interesting to stop here. So, with some
neurons, we build a network. To do that, it is just necessary to define links
between neurons.
To simplify, we group neurons into different layers. If neurons of a
single layer are linked together, it is a 'Hopfield' type layer, else it is a
'Perceptron' type layer. The final network is formed of various different
layers...
                                        ____________
                                       /            \
    (1)   ...   (n)                  (1)--(2) ... (n)
   / | \       / | \                  |    |       |
  (1) ... (i) ... (n')
  / | \   / | \   / | \

      Perceptron                         Hopfield
If we use more than one Perceptron layer, we obtain a ...
multi-perceptron network. This structure is often used for handwriting
recognition and some expert systems. The Hopfield structure can learn fewer
prototypes, and is a dynamic system, evolving towards a stable state. It can
also be used as an auto-associative memory, but isn't very efficient (to tell
you the truth).
There are many other types of networks, such as Kohonen networks,
Boltzmann machines, the auto-feedback model, etc... but don't worry, I won't
talk about them in this doc ! If you really want to know something about them, I
invite you to read some of the books in the bibliography list.
II.4) Let's go to school
------------------------
If you want to use your network, you need to teach it something nice.
There are two main algorithms used for learning. A Hopfield network can learn
directly with the Hebb rule, and a multi-layer network learns better with the
back-propagation rule (an extension of the Widrow-Hoff rule).
Now, let's consider a Hopfield network. You now know that it has just a
single connected layer (say, 'n' neurons). Suppose you want to teach it to
memorize a prototype composed of 'n' data (usually binary data, +1 or -1). In
that case a prototype is just an input. The output you want to teach to the net
IS the input. You just apply:
Wij(t+1)=Wij(t)+m*Pi*Pj (Hebb rule)
where Pi, Pj are the inputs of the prototype associated with
the neurons i and j.
m>0 is a constant or, better, the boolean value (i<>j).
That procedure has to be repeated for each prototype, as long as the
network hasn't memorized it, that is to say until the prototype is
a stable state for the net (don't forget that such a net has its own dynamics).
For a Hopfield net, the prototypes can be memorized if they are mutually
orthogonal, and the net is stable if Sign(Sum(Wij*Pj))=Sign(Pi).
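A minimal sketch of that learning procedure (assuming a fully connected
layer and taking m as the boolean value (i<>j), as suggested above; the names
are mine):

    def hebb_learn(prototypes, n):
        # Wij(t+1) = Wij(t) + m*Pi*Pj, with m = 0 when i == j (no self-link).
        w = [[0.0] * n for _ in range(n)]
        for p in prototypes:                 # each prototype: n values in {+1, -1}
            for i in range(n):
                for j in range(n):
                    if i != j:
                        w[i][j] += p[i] * p[j]
        return w

    def is_stable(w, p):
        # The prototype is memorized if Sign(Sum(Wij*Pj)) = Sign(Pi) for every i.
        for i, row in enumerate(w):
            s = sum(wij * pj for wij, pj in zip(row, p))
            if (s >= 0.0) != (p[i] >= 0.0):
                return False
        return True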
For a multi-layer network, it's a little more complicated. Basically, we
set the input of the net, then compute the global state, compare the output to
the expected one (we are teaching the net an input-output prototype), and then
compute a local error for each output neuron. The 'error gradient
back-propagation' algorithm consists in calculating the local error for each
neuron, from the output towards the input, and changing the synaptic links.
To apply this method, we need a sigmoid type neuron. In fact we try to
reduce the quadratic error:
QE=Sum((Ai-Oi)^2)/2
where Oi is the expected (reference) output.
Let's consider a net formed with the layers 'i', 'j' and 'k'. The local
error for an output neuron is:
Ek=(Ak-Ok)*f'(Sk) 'k' index is for the output layer.
where f' is the derivative of the sigmoid.
For an intermediate layer (index 'j') the error is:
Ej=Sum(Wjk*Ek)*f'(Sj)
The link Wij is then modified:
Wij=Wij-m*Ej*Ai (m>0)
NB: we also have Wjk=Wjk-m*Ek*Aj, and a similar relation for the bias B.
Typically, for a three-layer net, 'k' is the output, 'i' the input and
'j' is a 'hidden' layer.
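Here is a minimal sketch of one back-propagation step for such a
three-layer net, written with plain Python lists (the tanh sigmoid and the
learning rate m=0.1 are my own choices):

    import math

    def fprime(s):
        # derivative of the tanh sigmoid with respect to the potential S
        return 1.0 - math.tanh(s) ** 2

    def backprop_step(x, o, wij, wjk, bj, bk, m=0.1):
        # Forward pass: input layer 'i' (values x), hidden 'j', output 'k'.
        sj = [sum(wij[i][j] * x[i] for i in range(len(x))) + bj[j]
              for j in range(len(bj))]
        aj = [math.tanh(s) for s in sj]
        sk = [sum(wjk[j][k] * aj[j] for j in range(len(aj))) + bk[k]
              for k in range(len(bk))]
        ak = [math.tanh(s) for s in sk]
        # Local errors: Ek = (Ak-Ok)*f'(Sk), then Ej = Sum(Wjk*Ek)*f'(Sj).
        ek = [(ak[k] - o[k]) * fprime(sk[k]) for k in range(len(o))]
        ej = [sum(wjk[j][k] * ek[k] for k in range(len(o))) * fprime(sj[j])
              for j in range(len(aj))]
        # Link and bias corrections: Wij = Wij - m*Ej*Ai, Wjk = Wjk - m*Ek*Aj.
        for i in range(len(x)):
            for j in range(len(aj)):
                wij[i][j] -= m * ej[j] * x[i]
        for j in range(len(aj)):
            for k in range(len(o)):
                wjk[j][k] -= m * ek[k] * aj[j]
            bj[j] -= m * ej[j]
        for k in range(len(o)):
            bk[k] -= m * ek[k]
        # Quadratic error QE = Sum((Ak-Ok)^2)/2, useful to watch the learning.
        return sum((a - t) ** 2 for a, t in zip(ak, o)) / 2.0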
Anyway, for every network, we can define some statistical variables:
E=-Sum(Wij*Ai*Aj)/2 Global network energy.
W=Sum(Wij)/N Mean synaptic link.
QW=Sum(Wij^2)/N Quadratic (mean squared) link.
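These can be computed directly from the links and activities; a short
sketch (I read N as the number of neurons, which is one possible convention):

    def net_statistics(w, a):
        n = len(a)
        e  = -sum(w[i][j] * a[i] * a[j] for i in range(n) for j in range(n)) / 2.0
        wm =  sum(w[i][j]              for i in range(n) for j in range(n)) / n
        qw =  sum(w[i][j] ** 2         for i in range(n) for j in range(n)) / n
        return e, wm, qw    # energy E, mean link W, quadratic link QW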
II.5) What's up doc ? (this is the most interesting)
---------------------
All right. Now your network has learned some prototypes. We are going
to see a few properties of your net.
What a net !
The memorization is performed thanks to the synaptic links, so the
information isn't localized (as it is in a conventional logic computer). You can
even partially destroy the net without losing the information ! (but of course
if you remove too many neurons, the output will have nothing in common with what
you expect). This explains why the human brain loses thousands of neurons daily,
but you don't feel it until you're very old !
Memorization capacity.
Anyway, don't be too hopeful: your net has a limited memorization
capacity. If you make it learn too many things, it may forget everything in a
big mess ! In fact, the number of prototypes it can memorize depends on the
number of neurons and links.
To quantify this, we define
alpha=P/C
where P is the number of prototypes and
C is the mean number of connections per neuron.
The thermodynamic study shows that for a Hopfield network, there is a
critical value: alphaC ~ 0.14.
If alpha<alphaC, the prototypes are memorized.
At alpha=alphaC, a first-order transition occurs.
And if alpha>alphaC, the network saturates and forgets everything.
This study has been done by analogy with a magnetic spin system.
Usually, a multi-layer network has a critical alpha between 1 and 2,
and we can avoid saturating the net by limiting the variations of the synaptic
links; the net then forgets the oldest prototypes progressively.
So, a Hopfield net can't learn too many things, but a multi-layer one
is much more efficient.
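A quick worked example of the alpha=P/C rule (the 100-neuron figure is
only there to illustrate the arithmetic):

    n = 100                 # fully connected Hopfield net of 100 neurons
    c = n - 1               # mean number of connections per neuron
    alpha_c = 0.14          # critical value for a Hopfield net
    p_max = alpha_c * c     # about 14 prototypes before the net saturates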
Generalization capacity
The network knows what you haven't told it ! In fact it can predict and
extrapolate an output for an input it doesn't know. This is why you can use it
for character recognition. It is called an auto-associative memory.
It has been shown that the better a net knows its prototypes, the less
it can generalize, so you have to find a compromise. I usually use a 5 % to 10 %
error rate.
How does it really work?
Basically, a Perceptron layer is a classifier. It defines a partition
of the prototypes into 2^n classes limited by hyperplanes in the solution space.
Inside the hidden layers, the net builds an internal representation of each
prototype. Studying those layers shows how the network memorizes and organizes
its information.
For a Hopfield layer, it's different. The memorized prototypes are in
fact the minima of the energy function. That's why we can use such a net for
function optimization problems and shape recognition.
II.6) What you can do with it
-----------------------------
Here are some particular applications of neural networks. Of course,
there are many others...
Parity recognition
We use a three-layer perceptron network:
input 8 neurons from 0 to 255.
hidden 8 neurons
output 1 neuron -1=odd,+1=even.
You teach the right parity for some numbers (for instance 1 to 15) and
the net will extrapolate for other inputs.
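For instance, reusing the backprop_step sketch from section II.4 (the
+1/-1 bit coding and the random starting links are my own choices):

    import random

    def to_bits(number):
        # 8 input neurons: the bits of the number, coded as +1 / -1.
        return [1.0 if (number >> b) & 1 else -1.0 for b in range(8)]

    wij = [[random.uniform(-0.5, 0.5) for _ in range(8)] for _ in range(8)]
    wjk = [[random.uniform(-0.5, 0.5)] for _ in range(8)]
    bj, bk = [0.0] * 8, [0.0]

    for epoch in range(2000):
        for number in range(1, 16):                      # teach the parity of 1..15
            target = [1.0 if number % 2 == 0 else -1.0]  # +1 = even, -1 = odd
            backprop_step(to_bits(number), target, wij, wjk, bj, bk, m=0.1)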
Data compression
If you want to compress pictures or sound, you can use a three-layer
perceptron network:
input n1 neurons a bit of data
hidden n2 neurons n2<n1 -> compression
output n1 neurons input=output for learning.
Inside the hidden layer, a compression (with possibly some losses) is
performed because n2<n1. To code the picture, for instance, you just need to
save the internal representation of the hidden layer. To decode the compressed
image, you set the state of the hidden layer to the saved state.
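A sketch of the coding/decoding part (the learning itself is just
backprop_step with the output prototype equal to the input; the names are
mine):

    import math

    def compress(x, wij, bj):
        # The 'code' is the internal representation of the hidden layer:
        # n2 activities instead of the n1 input values (n2 < n1).
        return [math.tanh(sum(wij[i][j] * x[i] for i in range(len(x))) + bj[j])
                for j in range(len(bj))]

    def decompress(aj, wjk, bk):
        # To decode, set the hidden layer to the saved state and read the output.
        return [math.tanh(sum(wjk[j][k] * aj[j] for j in range(len(aj))) + bk[k])
                for k in range(len(bk))]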
Function optimisation
Suppose you've got a problem in which you need to find the extrema of a
function - for example, the traveling salesman problem - you can then use a
Hopfield network whose energy function is the function you are studying. The
only difficulty is to find the link variations you need to impose.
E=-Sum(Wij*Ai*Aj) global energy.
Si=Sum(Wij*Aj) we assume B=0 here.
Ai=f(Si) f: Heaviside or sigmoid.
Of course, you know the Hebb rule and your expression of E, the function
to optimize. You need to differentiate E to find a condition on Wij. I leave
the computation to you...
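Whatever links you end up with, the relaxation itself is simple; here is
a sketch of the dynamics that slides the net towards a minimum of E
(asynchronous updates, B=0 as above):

    import random

    def hopfield_relax(w, a, steps=1000):
        # Each update Ai = sign(Sum(Wij*Aj)) can only lower E = -Sum(Wij*Ai*Aj),
        # so the net evolves towards a stable state, i.e. a minimum of E.
        n = len(a)
        for _ in range(steps):
            i = random.randrange(n)
            s = sum(w[i][j] * a[j] for j in range(n))
            a[i] = 1.0 if s >= 0.0 else -1.0
        return a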
Character and shape recognition.
You can use a Hopfield network, but you'd better try it with a
multi-layer network (more efficient, better capacity). You just need to teach
the net different types of digitized characters (for instance, for a 10*14
character you need 140 input neurons). For the output, you associate a neuron
with each character (for instance 26 neurons for one alphabet). Possibly you can
use a single-layer network (an input and an output).
It's even better if you process the picture before sending it to the
net (thinning, removing isolated pixels,...).
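Once such a net is trained (with backprop_step, one input-output
prototype per character), reading its answer could look like this (the
40-neuron hidden layer is an arbitrary choice of mine):

    import math

    N_INPUT, N_HIDDEN, N_OUTPUT = 140, 40, 26   # 10*14 pixels in, one neuron per letter

    def recognize(pixels, wij, wjk, bj, bk):
        # Forward pass only: the answer is the output neuron with the highest activity.
        aj = [math.tanh(sum(wij[i][j] * pixels[i] for i in range(N_INPUT)) + bj[j])
              for j in range(N_HIDDEN)]
        ak = [math.tanh(sum(wjk[j][k] * aj[j] for j in range(N_HIDDEN)) + bk[k])
              for k in range(N_OUTPUT)]
        return "ABCDEFGHIJKLMNOPQRSTUVWXYZ"[ak.index(max(ak))]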
(next to read: NEURON.DOC and NETWORK.DOC)